Linguistically-aware attention for reducing the semantic gap in vision-language tasks

Authors

Abstract

Attention models are widely used in Vision-language (V-L) tasks to perform the visual-textual correlation. Humans perform such a correlation with a strong linguistic understanding of the visual world. However, even the best performing attention model in V-L tasks lacks such a high-level linguistic understanding, thus creating a semantic gap between the modalities. In this paper, we propose an attention mechanism - Linguistically-aware Attention (LAT) - that leverages object attributes obtained from generic object detectors along with pre-trained language models to reduce this semantic gap. LAT represents visual and textual modalities in a common linguistically-rich space, thus providing linguistic awareness to the attention process. We apply and demonstrate the effectiveness of LAT in three V-L tasks: Counting-VQA, VQA, and Image captioning. In Counting-VQA, we propose a novel counting-specific VQA model to predict an intuitive count and achieve state-of-the-art results on five datasets. In VQA and Captioning, we show the generic nature and effectiveness of LAT by adapting it into various baselines and consistently improving their performance.
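The idea of placing both modalities in one linguistic space can be illustrated with a toy sketch. This is not the authors' LAT model: it only assumes that each detected region is represented by the word embedding of its detector label and that the question is the mean of its word embeddings, then applies ordinary scaled dot-product attention. The `EMBED` table, its values, and the `attend` function are hypothetical; a real system would take vectors from a pre-trained language model.

```python
import math

# Hypothetical 3-d "word embeddings"; values are made up for illustration.
EMBED = {
    "dog":  [0.9, 0.1, 0.0],
    "cat":  [0.8, 0.2, 0.1],
    "car":  [0.0, 0.1, 0.9],
    "how":  [0.1, 0.1, 0.1],
    "many": [0.1, 0.2, 0.1],
    "dogs": [0.9, 0.1, 0.1],
}

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(question_words, region_labels):
    """Return one attention weight per detected region.

    Regions are embedded via their detector labels, so regions and
    question words share the same embedding space (an assumption of
    this sketch, not a detail taken from the paper).
    """
    q_vecs = [EMBED[w] for w in question_words]
    # Question vector: element-wise mean of the word embeddings.
    q = [sum(col) / len(q_vecs) for col in zip(*q_vecs)]
    scale = math.sqrt(len(q))
    scores = [dot(q, EMBED[label]) / scale for label in region_labels]
    return softmax(scores)

# Regions labeled by a (hypothetical) detector as dog, car, cat.
weights = attend(["how", "many", "dogs"], ["dog", "car", "cat"])
```

With these toy vectors the region labeled `dog` receives the largest weight and `car` the smallest, because the question vector lies closest to the `dog` embedding.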


Similar resources

On the teacher-generated vs. learner-generated noticing-the-gap activities in language classes

The purpose of this study is twofold. On the one hand, it is intended to see which kind of noticing-the-gap activity (teacher-generated vs. learner-generated) is more efficient in teaching L2 grammar in classroom language learning. On the other hand, it is an attempt to determine which approach to the noticing-the-gap activity is more effective in the long-term retention of grammar...

The relationship among EFL learners' autonomy, first language essay writing tasks and second language essay writing tasks in task/content based language instruction

The ability to compose a coherent and extended piece of writing in a second language is considered a fundamental factor in conveying learners' information and ideas on academic issues. Although learners may achieve a perfect academic writing skill through assigning the L2 tasks in content based instruction, demonstration of their abilities may be related to their ability in L1 es...

Towards Reducing the Social-Technical Gap in Location-Aware Computing

Throughout their history, humans have never ceased to create techniques and tools for observing their environment and locating themselves in the physical environment. This attests to our need to be aware of who and what is where and when, a concept that we term location awareness. Nowadays, the democratization of mobile and wireless technologies increases people's awareness of their whereabouts. Howe...

An Efficient Algorithm for Reducing the Duality Gap in a Special Class of the Knapsack Problem

A special class of the knapsack problem is called the separable nonlinear knapsack problem. This problem has received considerable attention recently because of its numerous applications. Dynamic programming is one of the basic approaches for solving this problem. Unfortunately, the size of the state space increases dramatically and causes the dimensionality problem. In this paper, an efficient a...

Journal

Journal title: Pattern Recognition

Year: 2021

ISSN: 1873-5142, 0031-3203

DOI: https://doi.org/10.1016/j.patcog.2020.107812